We can think of a logistic regression as model in which individuals have a latent trait that determines, given a threshold, if they fall in one category or another. For instance, latent scholastic aptitude is a trait that (co-) determines if a student passes a class:
threshold = -2
set_par(cex = 1.5)
curve(dlogis(x),-6,6, xlim = c(-5,5),
xlab = "latent scholastic aptitude")
x = seq(threshold,5,.1)
shade(dlogis(x)~x,c(threshold,6), col = adjustcolor("green3",.5))
x = seq(-6,threshold,.1)
shade(dlogis(x)~x,c(-6,threshold), col = adjustcolor("red3",.5))
abline(v = threshold, lty = 2, lwd = 2)
In this model, the latent variable is distributed according to the logistic distribution.
We can calculate the log odds of an outcome as before, this time using probabilities.
\[ \textrm{log odds} = log \left( \frac{P_{fail}}{P_{pass}} \right) = log \left( \frac{P_{fail}}{1-P_{fail}} \right) \] To get the probability to fail at a certain threshold, we use the cumulative probability function of the logistic distribution, also called the inverse logit.
In R we can use boot::inv.logit or plogis
to get the cumulative density function of the logistic distribution.
set_par(cex = 1.5)
curve(plogis(x),-6,6,xlim = c(-5,5),
xlab = "latent scholastic aptitude",
ylab = "plogis(x) = inv_logit(x)")
curve(plogis(x),-6,-2,col = "red", add = T, lwd = 1.5)
curve(plogis(x),-2,6,col = "green3", add = T, lwd = 1.5)
lines(c(-6,threshold),rep(plogis(threshold),2), lty = 2, col = "red")
lines(rep(threshold,2),c(0,plogis(threshold)), lty = 2, col = "red")
abline(h = c(0,1), lty = 2)
P_fail = plogis(threshold) # fail prob. from thresh. (-2)
P_pass = 1-P_fail # Pass probability
log_odds = log(P_fail/P_pass) # calculate log odds
log_odds # log odds = threshold!
## [1] -2
This is exactly our threshold and explains why the intercept
coefficient of an intercept-only logistic regression for data with a
fail-rate of 12% will be qlogis(.12) = -2.
The following figure shows that we can use the cumulative
distribution function (plogis or
inv.logit) to go from a threshold value to success
probabilities and the quantile function (qlogis or
logit) to go from success probabilities to thresholds.
set_par(mfrow = c(1,2), cex = 1.5)
curve(plogis(x),-5,5, xlab = "threshold", ylab = "plogis(x) or inv.logit(x)")
arrows(x0 = threshold, y0 = 0, y1 = plogis(threshold), col = "blue", lwd = 2, length = .2)
arrows(x0 = threshold, x1 = -5.4, y0 = plogis(threshold), col = "blue", lwd = 2, length = .2)
text(-5.4,plogis(threshold),labels = round(plogis(threshold),2), pos = 2, col = "blue", xpd = TRUE, cex = .75)
abline(h = c(0,1), lty = 2)
curve(qlogis(x),0,1, xlab = "cumulative probability", ylab = "qlogis(x) or logit(x)")
arrows(x0 = plogis(threshold), y0 = -5, y1 = threshold, col = "blue", lwd = 2, length = .2)
arrows(y0 = threshold, x0 = plogis(threshold), x1 = -0.04, col = "blue", lwd = 2, length = .2)
text(-5.4,plogis(threshold),labels = round(plogis(threshold),1), pos = 2, xpd = TRUE)
text(plogis(threshold),-5.4,labels = round(plogis(threshold),2), pos = 1, col = "blue", xpd = TRUE, cex = .75)
abline(v = c(0,1), lty = 2)
Now let us assume that there is another variable \(\small X\) that increases the probability to pass the class, so that also students with lower latent scholastic aptitude of -3 can pass a class (could be class difficulty).
In a logistic regression we estimate the effect of \(\small X\) as follows:
\[ log \left( \frac{P_{fail}}{P_{pass}} \right) = \alpha + \beta \cdot X \]
Here, \(\small \alpha\) is the intercept and baseline log-odds to pass the test (due to latent scholastic aptitude) and \(\small \beta\) captures the change in log odds of passing the class due to \(\small X\). Because the log odds can also be understood as thresholds on top of a logistic-distributed latent variable, you can also think of \(\small \alpha\) and \(\small \beta\) together determining the threshold value for passing and the thus the probability to pass.
Because our baseline threshold / log odds is -2 and the threshold for children with variable X is -3, we calculate beta as the difference of the two thresholds:
\[ \beta = -3-(-2) = -1 \]
So if we simulate data according to this data generating process and estimate a logistic regression, we should find that \(\small \alpha = -2\) and \(\small \beta = -1\).
Lets simulate the data:
set.seed(1)
N = 10000
thresholds = rep(-2,N)
X = sample(c(0,1), N, replace = TRUE)
thresholds[X==1] = -3
trait_values = rlogis(N)
pass = trait_values > thresholds
We estimate the model:
q.fit = quap(
alist(
pass ~ dbinom(1,p),
logit(p) ~ -(a + b*X),
a ~ dnorm(0,3),
b ~ dnorm(0,3)
),
data = list(pass = pass, X = X)
)
precis(q.fit) %>%
round(2)
## mean sd 5.5% 94.5%
## a -1.96 0.04 -2.03 -1.89
## b -1.07 0.08 -1.19 -0.94
Due to simulation noise when generating trait_values we
are not exactly recovering the parameters, but we are getting very close
to the correct thresholds with -1.96 at baseline and a + b = -1.96 +
-1.07 = -3.03 for individuals where X = 1.
This shows us again that regression weights in logistic
regressions represent changes in log odds ratios, and thus also changes
in the threshold or success probability, due to a predictor
variable. As we discussed in the last class, these changes are
due to the exp in the link function multiplicative on the
probability scale.
The ordinal logistic regression is named as such, because just like logistic regression, it assumes a latent logistic variable that determines responses. Differently than the standard logistic regression, ordered logistic regression uses not one but multiple thresholds in order to map a latent trait (variable) onto ordered responses.
Lets visualize this with 3 thresholds of our choice for \(\small M=4\) categories.
thresholds = c(-1.24,.25,2)
thresholds
## [1] -1.24 0.25 2.00
plot_dlogis.thresh = function(threshs, xlim = c(-5,5), plot.text = TRUE) {
threshs = as.numeric(threshs)
x.st = xlim[1]-1
x.en = xlim[2]+1
curve(dlogis(x),x.st,x.en,xlim = xlim,
xlab = "latent scholastic aptitude")
n_cat = length(threshs) + 1
clrs =
colorRampPalette(c("blue","blue4"))(n_cat)
for (k in 1:n_cat) {
st = ifelse(k == 1, x.st, threshs[k-1])
en = ifelse(k == n_cat, x.en, threshs[k])
x = seq(st,en,length.out = 50)
polygon(c(min(x),x,max(x)), c(0,dlogis(x),0), col = clrs[k])
arrows(x0 = threshs[k], y0 = 0, y1 = dlogis(0), lty = 2, col = "red", lwd = 2,length = 0)
if (plot.text == TRUE) {
text((st+en)/2,dlogis(0)/2,paste0("Y=",k), col = "grey50", cex = 1.5)
text(threshs[k],dlogis(0),round(threshs[k],2), col = "red", pos = 2)
}
}
}
n_cat = length(thresholds) + 1
clrs = colorRampPalette(c("blue","blue4"))(n_cat)
set_par(cex = 1.5)
plot_dlogis.thresh(thresholds)
Just as we can use the (cumulative) logistic distribution to recover threshold values (log odds) for the simple logistic regression, we can use it to recover thresholds for an ordered logistic model.
For instance, when 22.4% of individuals respond with Y = 1:
threshold_1 = log(.224/(1-.224)) # note the log odds
threshold_1 %>% round(2)
## [1] -1.24
As you might have guessed, we can also read the cumulative probabilities to give a response up to a level \(\small k\) given \(\small k-1\) thresholds from the cumulative logistic distribution.
Then, we obtain a categorical distribution by calculating the probability of a response as the difference of the cumulative probabilities at the thresholds that enclose that category. This categorical distribution that is used to calculate the likelihood for ordinal responses.
plot_plogis.thresh = function(thresholds, plot.thrsh.eq = FALSE, xlim = c(-5,5)) {
x.st = xlim[1]-1
x.en = xlim[2]+1
n_cat = length(thresholds) + 1
clrs =
colorRampPalette(c("blue","blue4"))(n_cat)
curve(plogis(x),x.st,x.en,xlim = c(-5,5),
xlab = "latent scholastic aptitude")
for (k in 1:n_cat) {
s = ifelse(k == 1, x.st, thresholds[k-1])
e = ifelse(k == n_cat, x.en, thresholds[k])
curve(plogis(x), s,e, col = clrs[k], add = T, lwd = 2)
b = ifelse(k == 1, 0, plogis(thresholds[k-1]))
t = ifelse(k == n_cat, 1, plogis(thresholds[k]))
x = seq(s,e,.01)
y.p = c(plogis(x), plogis(max(x)))
x.p = c(x,x.en)
polygon(c(x.p,x.en),c(y.p,plogis(x[1])), col = clrs[k], border = "NA")
arrows(x0 = -4, y1 = t, y0 = b, angle = 90, code = 3, length = .15)
prob.level =
ifelse(
k == 1,
plogis(thresholds[k]),
ifelse(
k == n_cat,
1-plogis(thresholds[k-1]),
plogis(thresholds[k])-plogis(thresholds[k-1]))) %>%
round(2)
text(-4, (b+t)/2, paste0("P(Y=",k,")=", prob.level), pos = 4, cex = .8)
if (k < n_cat) {
p=plogis(thresholds[k])
thrsh = log(p/(1-p)) %>% round(2)
thrsh.equ = bquote(frac(.(round(p,2)),.(round(1-p,2)))~"=exp("~.(thrsh)~")")
if (plot.thrsh.eq == TRUE) text(5.5,p,thrsh.equ, xpd = T, pos = 4, cex = .8)
arrows(x0 = thresholds[k], y0 = 0, y1 = p, col = "red", lty = 2)
arrows(x0 = thresholds[k], x1 = -5.4, y0 = p, col = "red", lty = 2)
}
}
points(thresholds,rep(0,n_cat-1), pch = "|", col = "red")
}
set_par(cex = 1.5, mar = c(3,3,.5,7))
plot_plogis.thresh(thresholds, plot.thrsh.eq = TRUE)
Let’s again simulate data from the model and estimate the thresholds.
N = 1000
Y = cut(
rlogis(1000), # latent variable
breaks = c(-Inf,thresholds,Inf)) %>% # thresholds
as.numeric()
which we can show as histogram and as cumulative counts:
set_par(mfrow = c(1,2))
Y %>%
table() %>%
barplot(ylab = "N",
col = clrs,
xlab = "Response")
m = diag(4)
m[upper.tri(m)] = 1
m = apply(m,2, function(x) x*Y %>% table())
x = barplot(m, col = clrs,
ylab = "cumulative N",
xlab = "Response",
names.arg = 1:n_cat)
ps = c(plogis(thresholds[1]),diff(c(plogis(thresholds),1)))
cs = c(0,plogis(thresholds),1)
for (k in 1:n_cat)
text(tail(x,1),
mean(cs[k:(k+1)])*N,
paste(round(ps[k]*100),"%"),
col ="white",
cex = .5)
Here is an ordered logistic regression model:
bol.model =
alist(
Y ~ dordlogit(0,thresholds),
thresholds ~ dnorm(0,3)
)
The key difference between the simple logistic and the cumulative ordered logistic regression with responses \(\small Y_i \in \{0,1\}\) is that the former has only one threshold parameter:
\[ log \left( \frac{P(Y_i = 1)}{P(Y_i = 0)} \right) = \alpha \]
and the latter has, given responses \(\small Y_i \in \{1, 2, ..., M\}\) in ordered categories, \(\small M-1\) threshold parameters:
\[ log \left( \frac{P(Y_i>k)}{P(Y_i<k)} \right) = \alpha_k \]
with responses \(\small k \in \{1,2...M\}\)
The dordlogit function in the rethinking package
calculates the likelihood of the observations given the model by
using the plogis / inv_logit link functions
and threshold values to calculate likelihoods from a categorical
distribution.. The implicit latent variable remains unchanged,
but thresholds are adjusted to maximize the posterior probability of the
data.
set_par(mfrow = c(2,2))
thresh_list = list(thresholds,thresholds-1.25)
for (k in 1:2) {
thres = thresh_list[[k]]
plot_plogis.thresh(thres)
rbind(Y %>%
table %>%
prop.table(),
rlogis(100) %>%
cut(c(-Inf,thres,+Inf)) %>%
table %>% prop.table()) %>%
barplot(beside = T,
col = c("grey50","blue"),
ylab="proportion",
xlab = "Response")
legend("topleft", fill = c("grey50","blue"),
legend = c("data",expression("model | "~theta)), bty = "n")
}
Moving the thresholds changes the probability / likelihood to observe
the different categories.
The basic ordered logistic regression model without predictors finds the threshold values that allow to reproduce the observed distribution of ratings.
We fit the model.
bol.fit = ulam(
bol.model,
data=list(Y=Y),
chains=4,cores=4)
And here are the estimated thresholds, with the true thresholds as vertical, dotted, blue lines.
precis(bol.fit, depth = 2) %>% plot()
threshold.est = precis(bol.fit, depth = 2)[,1]
text(threshold.est,
3:1,
round(threshold.est,2), pos = 3)
abline(v = thresholds,col = "blue", lty = 3)
As expected, we recover the correct threshold values -1.24, 0.25, 2 (with some error/bias due to simulation and prior).
To understand how to set up a model such that a variable can increase or decrease the probability of higher response levels, lets look at the plot of latent variable and thresholds again. The plot also shows a hypothetical person with an average latent scholastic aptitude of 0 as a white dot.
set_par(cex = 1.5)
plot_dlogis.thresh(thresholds)
plotrix::draw.circle(0,.005,radius = .1,col = "white")
If we want to increase the chance that this hypothetical person achieves a rating of Y = 3, we can either increase the value of the latent variable or shift the the thresholds to the left:
set_par(mfrow = c(1,2),cex = 1.25)
plot_dlogis.thresh(thresholds)
arrows(x0 = 0, x1 = 1, y0 = .005, length = .1, col = "yellow")
plotrix::draw.circle(1,.005, radius = .1,col = "white")
plotrix::draw.circle(0,.005, radius = .1,col = adjustcolor("white",alpha = .75))
thresholds.b = thresholds - 1
plot_dlogis.thresh(thresholds.b)
abline(v = thresholds, col = adjustcolor("red",alpha = .25), lty = 2)
arrows(x0 = thresholds, x1 = thresholds.b,
y0 = c(seq(.025,.075,length.out = 3)),
col = "yellow", length = .1)
plotrix::draw.circle(0,.005,radius = .1,col = "white")
If we shift the latent value or the thresholds by the same amount, the probability of a higher response rises equally. In both plots above the bright white dot is in in the middle between the 2|3 and 3|4 thresholds.
Therefore we can, as we saw earlier for the logistic regression, add terms for individual level effects to the basic threshold / intercept only model to capture the effect of predictor variables on responses:
\[ log \left( \frac{P(Y_i>k)}{P(Y_i<k)} \right) = \alpha_k + \beta \cdot X_i \] where \(\small \alpha_k\) are thresholds and \(\beta\) captures the effect of the variable \(X\) by modifying each individual’s threshold.
Now we have seen previously that a threshold is just the log odds of responding in a category above vs below the threshold. Therefore, a positive (negative) regression weight in an ordinal logistic regression should be interpreted as the log odds ratio to give a one level higher (lower) response, i.e. Y = 4 instead of Y = 3 (Y = 2 instead of Y = 3).
Because there is only one regression weight \(\small \beta\) per variable that is added to all thresholds, the log odds ratios are the same for all category transitions, i.e.
\[ \small log \left( \frac{P(Y_i>1|X_i = 0)}{P(Y_i<1|X_i = 1)} \right) = log \left( \frac{P(Y_i>2|X_i = 0)}{P(Y_i<2|X_i = 1)} \right) = log \left( \frac{P(Y_i>3|X_i = 0)}{P(Y_i<3|X_i = 1)} \right) \]
This is referred to as the proportional odds assumptions, which may or may not be true for the data at hand. There are alternative models that do not make this assumption. Here is a good paper that describes these models: Ordinal Regression Models in Psychology: A Tutorial
To see the ordered logistic regression in action, we’ll use the example data from the Portuguese school1. In particular, we are looking at the final grade and estimate again the association with maternal education.
df=read.table("../Chapter11/data/student-mat.csv",sep=";",header=TRUE)
df = df[df$Medu>0,]
set_par()
df$G3 = as.numeric(cut(df$G3, breaks = seq(0,20,2),include.lowest = T))
df$G3 %>% table() %>% barplot()
Yes, cumulative ordinal logistic regressions can also handle multimodal
data! But there are limits to this ability.
To recapitulate that thresholds are just log odds at category boundaries, lets calculate them:
# 1. Cumulative probability of all classes
cum_prob = df$G3 %>% table() %>% prop.table() %>% cumsum()
cum_prob = cum_prob[cum_prob!=1]
# calculate logg odds
simple_thresholds = log(cum_prob/(1-cum_prob))
simple_thresholds %>% round(2)
## 1 2 3 4 5 6 7 8 9
## -2.23 -2.20 -1.69 -1.04 -0.11 0.71 1.51 2.73 4.16
And we estimate a thresholds only model with ulam:
tm.fit = ulam(
alist(
G3 ~ dordlogit(0,thresholds),
thresholds ~ dnorm(0,3)),
data=list(G3 = df$G3),
chains=4,cores=4)
Now we can plot the manually calculated thresholds against those
estimated with ulam:
post = extract.samples(tm.fit)
ulam_thresholds = colMeans(post$thresholds)
CI = apply(post$thresholds, 2, PI)
set_par(cex=1.5)
plot(simple_thresholds,
ulam_thresholds,
ylim = range(CI),
pch = 16, col = "blue")
arrows(x0 = simple_thresholds,
y0 = CI[1,], y1 = CI[2,],
col = "blue", length = 0)
abline(0,1,lty = 3, col = "grey")
Now that we have understood thresholds, we can implement a model that uses predictors. Specifically, we estimate the effect of past failure and maternal education on grades and allow an interaction between these two variables. Ordinal logistic regressions allow differences in the magnitude of slopes between groups without modelling interactions explicitly. But we do need to model interactions explicitly if we want to allow slopes with different signs for different groups.
ol.model =
alist(
G3 ~ dordlogit(phi,thresholds),
phi <- bE*(Medu-2.5) + iFE*failures,
iFE <- bF + bFE*(Medu-2.5), # interaction
c(bF, bE, bFE) ~ normal(0,1),
thresholds ~ dnorm(0,3)
)
We fit the model.
ol.fit = ulam(
ol.model,
data=list(G3 = df$G3, Medu = df$Medu, failures = df$failures),
chains=4,cores=4)
We first just check Rhat values to make sure the model converged:
precis(ol.fit, depth = 2) %>% round(3)
## mean sd 5.5% 94.5% n_eff Rhat4
## bFE -0.192 0.116 -0.375 -0.006 2546.960 1.000
## bE 0.345 0.090 0.200 0.494 2356.090 0.998
## bF -0.861 0.134 -1.080 -0.656 2307.201 1.000
## thresholds[1] -2.702 0.200 -3.023 -2.397 1481.249 1.001
## thresholds[2] -2.641 0.197 -2.956 -2.340 1541.319 1.000
## thresholds[3] -2.074 0.164 -2.336 -1.815 1913.871 1.000
## thresholds[4] -1.351 0.137 -1.573 -1.139 2438.878 0.999
## thresholds[5] -0.296 0.114 -0.479 -0.117 2392.613 0.999
## thresholds[6] 0.619 0.117 0.436 0.809 2363.338 1.001
## thresholds[7] 1.477 0.139 1.260 1.704 2410.789 0.999
## thresholds[8] 2.750 0.217 2.422 3.112 2325.046 0.998
## thresholds[9] 4.241 0.400 3.654 4.924 2385.215 1.001
This looks OK.
As a quick posterior predictive check we compare the histogram of the
observed and predicted grades. We use the sim function
instead of the link function to get posterior predictions
on the scale of the ordered outcome variable.
set_par()
pp = sim(ol.fit)
obs.counts = cut(df$G3, breaks = seq(.5,10.5,1)) %>% table()
plot(0:10,c(0,obs.counts), 'S', xaxt = "n", ylim = c(0,110), ylab = "grade")
axis(1,at = (1:10)-.5, labels = 1:10, lwd = 2)
for (k in 1:250) {
pp.counts = cut(pp[k,], breaks = seq(.5,10.5,1)) %>% table()
lines(0:10, c(0,pp.counts), 'S', col = adjustcolor("blue",alpha = .25))
}
lines(0:10,c(0,obs.counts), 'S', lwd = 2)
We can see that the ordinal logistic model has no problems modeling this
multimodal response distribution.
Lets quickly compare the thresholds with thresholds from a thresholds-only model again:
post = extract.samples(ol.fit)
ulam_thresholds = colMeans(post$thresholds)
CI = apply(post$thresholds, 2, PI)
set_par(cex=1.5)
plot(simple_thresholds,
ulam_thresholds,
ylim = range(CI),
pch = 16, col = "blue")
arrows(x0 = simple_thresholds,
y0 = CI[1,], y1 = CI[2,],
col = "blue", length = 0)
abline(0,1,lty = 3, col = "grey")
Because the model now also estimates effects of education and past failures, the thresholds are set off a bit 2. Only if the covariates would be independent of grades would we expect to see identical thresholds.
Now lets look at the coefficients of interest:
coeffs = precis(ol.fit, pars = c("bE","bF","bFE"))
coeffs %>% plot()
coeffs %>% round(2)
## mean sd 5.5% 94.5% n_eff Rhat4
## bE 0.35 0.09 0.20 0.49 2356.09 1
## bF -0.86 0.13 -1.08 -0.66 2307.20 1
## bFE -0.19 0.12 -0.38 -0.01 2546.96 1
What do these results mean?3
It is a bit hard to interpret these results because they are on the log odds ratio scale and because it is not totally clear (to me) how to interpret interaction effects. Therefor, we use posterior predictions to understand the results better.
First, we create posterior predictions for all levels of maternal education and past fails:
Lets look at the data for which we simulate outcomes:
# create all combinations of Medu and failures
sim.data = expand.grid(
Medu = sort(unique(df$Medu)),
failures = sort(unique(df$failures)))
# use the function sim to generate posterior predictions
pp = sim(ol.fit,data = sim.data, n = 2000)
sim.data
## Medu failures
## 1 1 0
## 2 2 0
## 3 3 0
## 4 4 0
## 5 1 1
## 6 2 1
## 7 3 1
## 8 4 1
## 9 1 2
## 10 2 2
## 11 3 2
## 12 4 2
## 13 1 3
## 14 2 3
## 15 3 3
## 16 4 3
Here is a general overview over the model predictions:
Note the highly uncertainty in the posterior predictions!
mu = colMeans(pp)
CI80 = apply(pp,2,function(x) PI(x, prob = .8))
CI90 = apply(pp,2,function(x) PI(x, prob = .9))
set_par(cex = 1.5)
plot(0,type = "n", xlim = c(0.5,4.5),
ylim = range(CI90), xaxt = "n",
xlab = "previous fails",
ylab = "grade")
axis(1,at = 1:4, labels = 0:3)
for (Medu in 1:4) {
idx = sim.data$Medu == Medu
x = (1:4)+(Medu-2.5)/6
points(x,mu[idx], pch = 16, col = Medu)
arrows(x0 = x, y0 = CI80[1,idx], y1 = CI80[2,idx], length = 0, col = Medu, lwd = 2)
arrows(x0 = x, y0 = CI90[1,idx], y1 = CI90[2,idx], length = 0, col = Medu)
}
legend("topright", pch = 16, col = 1:4, legend = 1:4, title = "Medu", bty = "n")
This plot suggests several questions:
Above, we used sim(ol.fit) to easily obtain predicted
response values. However, these are a bit noisy, because thy are
simulated randomly given the response probabilities phi and the
thresholds. We can be more accurate by using only phi, and thresholds to
calculate response-probabilities and then calculate expected responses
by weighting the responses with these probabilities. This does not
involve simulation of individual level responses and is therefore less
noisy:
phi = link(ol.fit,data = sim.data)$phi
thresholds = extract_post_ulam(ol.fit)$thresholds
pp = matrix(NA, nrow = nrow(thresholds), ncol = nrow(sim.data))
for (k in 1:nrow(sim.data)) {
log_odds_ratio = apply(thresholds,2, function(x) x - phi[,k])
cum_prob = apply(log_odds_ratio,1, function(x) plogis(x)) %>% t()
cum_prob = cbind(cum_prob,1)
res_prob = cum_prob - cbind(0,cum_prob[,1:(ncol(cum_prob)-1)])
pp[,k] = apply(res_prob,1, function(x) sum(x*(1:10))) # expected grade
}
To answer the first question, “Do grades decrease from 0 through
3 previous fails?”, we calculate for 1,2,3 previous fails if
one less previous fail had lead to a better grade and average over these
3 comparisons:
The minimum number of previous fails is 0.
\[ \delta = \frac{1}{3}\sum^i_{1:3} G3(fails = i) - G3(fails = i-1) \] where \(\small G3(fails = i)\) is the expected grade give i previous fails.
posterior_hist = function(x,label, set_par = TRUE, xlim = NULL) {
if (set_par == T) set_par(cex = 1.5, mar=c(3,3,3,.5))
tlt = paste0(round(mean(x),2),", (",paste(round(PI(x),2), collapse = ", "),
")\n P(",label,">0) = ", round(mean(x>0),3))
hist(x, ylab = label, xlim = xlim, xlab = label,
main = tlt, cex.main = 1)
abline(v = 0, col = "red", lty = 3)
abline(v = (PI(x)), col = "blue", lwd = 2)
}
delta = rep(0,nrow(pp))
# pp is a matrix with posterior predictions
# for the data in sim.data
for (i in 1:3) { # i is number of previous fails
delta = delta +
pp[,which(sim.data$failures == i)] -
pp[,which(sim.data$failures == i-1)]
}
delta = delta / 3
posterior_hist( # un-hide previous code block for this function
delta,
"delta fails",
xlim = range(delta))
On average grades decreased by -0.96 for each previous fail.
To answer the second question, “Is higher maternal education associated with higher grades when there were 0 previous fails and with lower grades when there were 3 fails?”, we first calculate the effect of maternal education for students with 0 and 3 previous fails:
# calculate contrasts
delta_F0 = rep(0,nrow(pp))
delta_F3 = rep(0,nrow(pp))
# we loop over maternal education levels 2-4
# each time calculating the difference to the next lower level
for (i in 2:4) { # here, i is the maternal education level
delta_F0 = delta_F0 +
pp[,which(sim.data$Medu == i & sim.data$failures == 0)] -
pp[,which(sim.data$Medu == i-1 & sim.data$failures == 0)]
delta_F3 = delta_F3 +
pp[,which(sim.data$Medu == i & sim.data$failures == 3)] -
pp[,which(sim.data$Medu == i-1 & sim.data$failures == 3)]
}
# average effect of increasing Medu by 1 at 0 previous fails
delta_F0 = delta_F0 / 3
# average effect of increasing Medu by 1 at 3 previous fails
delta_F3 = delta_F3 / 3
We see that whereas higher maternal education is associated with better grades when the student had not failed the class yet, there is very weak evidence for the opposite effect for students who failed the class 3 times already. To see if the two effects are really different, we need to compare them explicitly, which we do next:
We calculate the difference between these two deltas:
interaction_eff = delta_F0-delta_F3
posterior_hist(
interaction_eff,
"delta(fails=0)-delta(fails=3)",
xlim = range(interaction_eff))
We are reasonably certain that there is a difference in slopes, but we can’t be sure.
The ordered probit uses the standard normal distribution for the latent variable. The only disadvantage I see with the ordered probit is that the coefficients cannot be interpreted as (log) odds ratios.
However, one can then interpret regression coefficients as changes on the level of the latent variable on the scale of a standard normal distribution. This can be more attractive, if one is primarily interested in the latent variable. If one is primarily interested in the relative frequency of the response categories the ordered logistic model is easier to interpret.
The basic idea for an ordered predictor with \(\small M\) levels is as follows:
This approach to ordered predictors can be used for any kind of outcome variable.
Over dispersion happens when the rate at which something happens is not fixed. E.g. if we count the number o drinks somebody had in the last year, this variable will be overdispersed, because different people have different rates at which they have drinks
Count data can be underdispersed when additional constraints influence behavior. E.g., if we are observing number of coffees bought per month at a shop that has a promotion where you get something free if you buy at least 10 coffees a month.
P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7. The data can be downloaded here↩︎
Not centering predictor variables leads to additional differences↩︎
remember that because thresholds are changed as \(\small\alpha - \beta X\) positive values of \(\small \beta\) mean that higher positive values x move the thresholds to the left and thus make higher-level responses more likely.↩︎